
238 research outputs found

DYNAMO-MAS: a multi-agent system for ontology evolution from text

Author: N Aussenac-Gilles
P Cimiano
S Aubin
S Lemouzy
V Tamma
VI Levenshtein
ZS Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2013

Manual ontology development and evolution are complex and time-consuming tasks, even when textual documents are used as knowledge sources in addition to human expertise or existing ontologies. Processing natural language in text produces huge amounts of linguistic data that need to be filtered out and structured. To support both of these tasks, we have developed DYNAMO-MAS, an interactive tool based on an adaptive multi-agent system (adaptive MAS or AMAS) that builds and evolves ontologies from text. DYNAMO-MAS is a partner system to build ontologies; the ontologist interacts with the system to validate or modify its outputs. This paper presents the architecture of DYNAMO-MAS, its operating principles and its evaluation on three case studies

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Using Social Media to Promote STEM Education: Matching College Students with Role Models

Author: A Bandura
AB Heldman
C Merrill
CJ Owen
D Karunanayake
DM Blei
EA Ensher
G Salton
HV Emmerik
JE Lydon
K Weber
L Tsui
M Hall
P Jaccard
P Lockwood
S Metz
TA Judge
VI Levenshtein
Publication date: 01/07/2016

STEM (Science, Technology, Engineering, and Mathematics) fields have become increasingly central to U.S. economic competitiveness and growth. The shortage in the STEM workforce has brought the promotion of STEM education to the forefront. The rapid growth of social media usage provides a unique opportunity to predict users' real-life identities and interests from online texts and photos. In this paper, we propose an innovative approach that leverages social media to promote STEM education: matching Twitter college student users with diverse LinkedIn STEM professionals using a ranking algorithm based on the similarities of their demographics and interests. We share the belief that increasing STEM presence by introducing career role models who share similar interests and demographics will inspire students to develop interests in STEM-related fields and emulate their models. Our evaluation on 2,000 real college students demonstrated the accuracy of our ranking algorithm. We also design a novel implementation that recommends matched role models to the students. Comment: 16 pages, 8 figures, accepted by ECML/PKDD 2016, Industrial Trac

arXiv.org e-Print Archive

Crossref

Improving translation memory matching and retrieval using paraphrases

Author: Constantin Orăsan
EK Whyman
GA Miller
H Somers
Josef van Genabith
Marcos Zampieri
Mihaela Vela
P Langlais
Rohit Gupta
Ruslan Mitkov
VI Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2016

This is an accepted manuscript of an article published by Springer Nature in Machine Translation on 02/11/2016, available online: https://doi.org/10.1007/s10590-016-9180-0 The accepted version of the publication may differ from the final published version. Most current Translation Memory (TM) systems work at the string level (character or word level) and lack semantic knowledge while matching. They use simple edit distance calculated on the surface form or some variation of it (stem, lemma), which does not take any semantic aspects into consideration in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing in the edit-distance metric. The approach computes edit distance while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that paraphrasing substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs
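As a rough illustration of the idea in this abstract, the sketch below computes a word-level edit distance by dynamic programming and lets a substitution between two words listed as paraphrases cost nothing. It is a simplified sketch under stated assumptions, not the authors' method: the function name `word_edit_distance`, the `paraphrases` parameter, and the restriction to single-word paraphrase pairs are all hypothetical, and the greedy approximation mentioned in the abstract is not reproduced.

```python
from typing import AbstractSet, FrozenSet, Sequence


def word_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    paraphrases: AbstractSet[FrozenSet[str]] = frozenset(),
) -> int:
    """Word-level Levenshtein distance; listed paraphrase pairs substitute at zero cost.

    `paraphrases` is a hypothetical set of unordered single-word pairs,
    e.g. {frozenset(("close", "shut"))}. Multi-word paraphrases are not handled.
    """
    # previous[j] is the distance between the source prefix read so far and target[:j].
    previous = list(range(len(target) + 1))
    for i, src_word in enumerate(source, start=1):
        current = [i]  # turning i source words into an empty target takes i deletions
        for j, tgt_word in enumerate(target, start=1):
            same = src_word == tgt_word or frozenset((src_word, tgt_word)) in paraphrases
            current.append(min(
                previous[j] + 1,                       # delete src_word
                current[j - 1] + 1,                    # insert tgt_word
                previous[j - 1] + (0 if same else 1),  # match or substitute
            ))
        previous = current
    return previous[-1]
```

With an empty paraphrase set this reduces to the plain surface-form edit distance the abstract describes as the baseline; adding the pair ("close", "shut") makes "close the window" and "shut the window" a distance-0 match.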

Crossref

Wolverhampton Intellectual Repository and E-theses

The first Automatic Translation Memory Cleaning Shared Task

Author: Carla Parra Escartín
Constantin Orasan
Eduard Barbu
F Pedregosa
J Tiedemann
LI Kuncheva
Luisa Bentivogli
Marcello Federico
Marco Turchi
Matteo Negri
Q McNemar
VI Levenshtein
WA Gale
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/12/2016

This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online: https://doi.org/10.1007/s10590-016-9183-x The accepted version of the publication may differ from the final published version. This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up of the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Wolverhampton Intellectual Repository and E-theses

Penalty-Based Aggregation of Strings

Author: B Ma
FJ Damerau
H Bustince
JK Lanctot
M Li
P Jaccard
PC Fishburn
RR Yager
RW Hamming
SH Owen
T Calvo
T Kohonen
VI Levenshtein
Publication date: 01/01/2019

International Summer School on Aggregation Operators (2019. Olomouc, Czech Republic)

Crossref

Repositorio Institucional de la Universidad de Oviedo

Ghent University Academic Bibliography

Fifty years of spellchecking

Author: Blair CR
Brooks G
Carlson AJ
Cucerzan S
Damerau FJ
Golding AR
Leech G
Levenshtein VI
McIlroy MD
Mihov S
Mitton R
Morris R
Oflazer K
Pedler J
Peterson JL
Pollock JL
Roger Mitton
Savary A
Sterling CM
Veronis J
Wagner RA
Wing AM
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

A short history of spellchecking from the late 1950s to the present day, describing its development through dictionary lookup, affix stripping, correction, confusion sets, and edit distance to the use of gigantic databases

Crossref

Birkbeck Institutional Research Online

Classifying Candidate Axioms via Dimensionality Reduction Techniques

Author: A Maedche
AGB Tettamanzi
B Schölkopf
D Dubois
D Fleischhacker
D Sacha
F Pedregosa
H Yin
I Huitzil
K Pearson
L Bühmann
LG Nonato
LVD Maaten
TH Nguyen
TW Anderson
VI Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

We assess the role of similarity measures and learning methods in classifying candidate axioms for automated schema induction through kernel-based learning algorithms. The evaluation is based on (i) three different similarity measures between axioms, and (ii) two alternative dimensionality reduction techniques to check the extent to which the considered similarities make it possible to separate true axioms from false axioms. The result of the dimensionality reduction process is subsequently fed to several learning algorithms, comparing the accuracy of all combinations of similarity, dimensionality reduction technique, and classification method. As a result, it is observed that it is not necessary to use sophisticated semantics-based similarity measures to obtain accurate predictions, and furthermore that classification performance only marginally depends on the choice of the learning method. Our results open the way to implementing efficient surrogate models for axiom scoring to speed up ontology learning and schema induction methods

Crossref

AIR Universita degli studi di Milano

INRIA a CCSD electronic archive server

Reconstructing Words from Right-Bounded-Block Words

Author: AWM Dress
B Manvel
G Rozenberg
I Krasikov
I Simon
J Berstel
L van Iersel
M Dudik
M Lothaire
M Rigo
PJ Kelly
PL Erdős
PV O’Neil
TH Cormen
V Berthé
VI Levenshtein
Publication date: 01/01/2020

A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b) + 1$ scattered factors of the form $a^{i} b$. Moreover, we generalize the result to alphabets of the form $\{1,\ldots,q\}$ by showing that at most $\sum^{q-1}_{i=1} |w|_i (q-i+1)$ scattered factors suffice to reconstruct $w$. Both results improve on the upper bounds known so far. Complexity time bounds on reconstruction algorithms are also considered here
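To make the counted quantity concrete, the sketch below computes how many times a word occurs as a scattered factor (that is, as a not necessarily contiguous subsequence) of another word, using a standard dynamic program. It only illustrates the counts that the reconstruction result takes as input; it does not implement the reconstruction procedure itself, and the function name `count_scattered_factor` is a hypothetical choice.

```python
def count_scattered_factor(word: str, factor: str) -> int:
    """Count the occurrences of `factor` as a scattered factor (subsequence) of `word`."""
    # ways[k] = number of ways to embed factor[:k] into the part of `word` read so far.
    # The empty prefix embeds in exactly one way.
    ways = [1] + [0] * len(factor)
    for letter in word:
        # Update right to left so the current letter extends each embedding at most once.
        for k in range(len(factor), 0, -1):
            if factor[k - 1] == letter:
                ways[k] += ways[k - 1]
    return ways[len(factor)]
```

For example, in the word aabab the factor ab occurs 5 times as a scattered factor and aab occurs 4 times; counts of this kind, for factors of the form a^i b, are the information the abstract refers to.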

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège

The Settlement of Madagascar: What Dialects and Languages Can Tell Us

Author: A Adelaar
D D'Urville
Enrico Scalas
F Petroni
M Serva
M Swadesh
Maurizio Serva
OC Dahl
P Vérin
Ph Blanchard
RD Gray
RM Blench
SJ Greenhill
VI Levenshtein
Publication venue: Public Library of Science
Publication date: 21/07/2011

The dialects of Madagascar belong to the Greater Barito East group of the Austronesian family and it is widely accepted that the Island was colonized by Indonesian sailors after a maritime trek that probably took place around 650 CE. The language most closely related to Malagasy dialects is Maanyan, but Malay is also strongly related, especially for navigation terms. Since the Maanyan Dayaks live along the Barito river in Kalimantan (Borneo) and do not possess the necessary skill for long maritime navigation, they were probably brought as subordinates by Malay sailors. In a recent paper we compared 23 different Malagasy dialects in order to determine the time and the landing area of the first colonization. In this research we use new data and new methods to confirm that the landing took place on the south-east coast of the Island. Furthermore, we are able to state here that colonization probably consisted of a single founding event rather than multiple settlements. To reach our goal we determine the internal kinship relations among all the 23 Malagasy dialects as well as the relations of the 23 dialects to Malay and Maanyan. The method used is an automated version of the lexicostatistic approach. The data from Madagascar were collected by the author at the beginning of 2010 and consist of Swadesh lists of 200 items for 23 dialects covering all areas of the Island. The lists for Maanyan and Malay were obtained from a published dataset integrated with the author's interviews

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Contextual and Behavioral Customer Journey Discovery Using a Genetic Approach

Author: A Gabadinho
B Vázquez-Barreiros
G Bernard
İ Gürvardar
KN Lemon
S Peltola
T Caliński
VI Levenshtein
WMP Aalst van der
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

With the advent of new technologies and the increase in customers’ expectations, services are becoming more complex. This complexity calls for new methods to understand, analyze, and improve service delivery. Summarizing customers’ experience using representative journeys that are displayed on a Customer Journey Map (CJM) is one of these techniques. We propose a genetic algorithm that automatically builds a CJM from raw customer experience recorded in a database. Mining representative journeys can be seen as a clustering task where both the sequence of activities and some contextual data (e.g., demographics) are considered when measuring the similarity between journeys. We show that our genetic approach outperforms traditional ways of handling this clustering task. Moreover, we apply our algorithm to a real dataset to highlight the benefit of using a genetic approach

Crossref

Serveur académique lausannois